This data set contains transcribed high-quality audio of English sentences
recorded by volunteers speaking different dialects of the language. The data set
consists of WAV files and a TSV file (line_index.tsv). The file line_index.csv
contains a line ID, an anonymized FileID and the transcription of the audio in
the file.
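
The three-field layout described above (line ID, FileID, transcription) can be
read with a few lines of Python. This is a minimal sketch, not part of the data
set's tooling: the function name `read_line_index` and the sample rows are
hypothetical, and the delimiter is left as a parameter because the index file
is referred to above with both a .tsv and a .csv name.

```python
def read_line_index(lines, delimiter="\t"):
    """Parse index lines into (line_id, file_id, transcription) tuples.

    `lines` is any iterable of strings, e.g. an open text file.
    The delimiter is an assumption; pass "," if the file is comma-separated.
    """
    rows = []
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw.strip():
            continue  # skip blank lines
        # Split at most twice so delimiters inside the transcription survive.
        parts = raw.split(delimiter, 2)
        if len(parts) != 3:
            continue  # skip malformed lines instead of guessing their layout
        line_id, file_id, text = (p.strip() for p in parts)
        rows.append((line_id, file_id, text))
    return rows


if __name__ == "__main__":
    # Made-up rows for illustration only; they are not taken from the data set.
    sample = [
        "ID0001\tfile_0001\tThis is a made-up sentence.\n",
        "ID0002\tfile_0002\tAnother made-up sentence.\n",
    ]
    for line_id, file_id, text in read_line_index(sample):
        print(line_id, file_id, text)
```

To read the real index, pass an open file instead of the sample list, for
example `read_line_index(open("line_index.tsv", encoding="utf-8"))`, adjusting
the file name and delimiter to match the downloaded archive.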

The recordings from the Welsh English speakers were collected in collaboration
with Cardiff University.

The data set contains the following number of lines per dialect and speaker gender:<br>
Irish English male: 450<br>
Midlands English female: 246<br>
Midlands English male: 450<br>
Northern English female: 750<br>
Northern English male: 2097<br>
Scottish English female: 894<br>
Scottish English male: 1649<br>
Southern English female: 4161<br>
Southern English male: 4331<br>
Welsh English female: 1199<br>
Welsh English male: 1650<br>
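
The per-group counts above can be summed to get totals for the whole data set,
per dialect and per gender. The snippet below is a small sketch that copies the
counts from the list; the variable and function names are illustrative.

```python
from collections import defaultdict

# Line counts copied from the list above, keyed by (dialect, gender).
LINE_COUNTS = {
    ("Irish", "male"): 450,
    ("Midlands", "female"): 246,
    ("Midlands", "male"): 450,
    ("Northern", "female"): 750,
    ("Northern", "male"): 2097,
    ("Scottish", "female"): 894,
    ("Scottish", "male"): 1649,
    ("Southern", "female"): 4161,
    ("Southern", "male"): 4331,
    ("Welsh", "female"): 1199,
    ("Welsh", "male"): 1650,
}


def totals_by(index):
    """Sum counts by one element of the key (0 = dialect, 1 = gender)."""
    totals = defaultdict(int)
    for key, count in LINE_COUNTS.items():
        totals[key[index]] += count
    return dict(totals)


if __name__ == "__main__":
    print("total:", sum(LINE_COUNTS.values()))  # total: 17877
    print("by dialect:", totals_by(0))
    print("by gender:", totals_by(1))
```

With these counts the data set holds 17877 lines in total, 7250 from female
speakers and 10627 from male speakers.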

<p>
The data set has been manually quality checked, but there might still be errors.
<p>
Please report any issues in the following issue tracker on GitHub:
<a href="https://github.com/googlei18n/language-resources/issues">
  https://github.com/googlei18n/language-resources/issues
</a>
<p>
See LICENSE file for license information.
<p>
Copyright 2018, 2019 Google, Inc.
<p>
If you use this data in publications, please cite it as follows:
<pre>
  @inproceedings{demirsahin-etal-2020-open,
    title = {{Open-source Multi-speaker Corpora of the English Accents in the British Isles}},
    author = {Demirsahin, Isin and Kjartansson, Oddur and Gutkin, Alexander and Rivera, Clara},
    booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference (LREC)},
    month = may,
    year = {2020},
    pages = {6532--6541},
    address = {Marseille, France},
    publisher = {European Language Resources Association (ELRA)},
    url = {https://www.aclweb.org/anthology/2020.lrec-1.804},
    ISBN = {979-10-95546-34-4},
  }
</pre>
